Atmallen8's group workspace

2022-10-14 20:17:59
[2022-10-14 20:17:57,794] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/800M/zero_to_fp32.py
2022-10-14 20:17:59
[2022-10-14 20:17:57,798] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/800M/global_step141000/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-10-14 20:38:38
[2022-10-14 20:38:36,969] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 32768.0
2022-10-14 20:38:40
[2022-10-14 20:38:38,759] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
2022-10-14 20:49:08
[2022-10-14 20:49:07,472] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/800M/zero_to_fp32.py
2022-10-14 20:49:08
[2022-10-14 20:49:07,502] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/800M/global_step142000/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-10-14 21:10:54
[2022-10-14 21:10:53,145] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 32768.0
2022-10-14 21:10:56
[2022-10-14 21:10:54,930] [INFO] [stage1.py:697:step] [deepspeed] fp16 dynamic loss scale overflow! Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
2022-10-14 21:20:20
[2022-10-14 21:20:18,723] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/800M/zero_to_fp32.py
2022-10-14 21:20:20
[2022-10-14 21:20:18,726] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/800M/global_step143000/zero_pp_rank_24_mp_rank_00_optim_states.pt
2022-10-14 21:20:30
[2022-10-14 21:20:29,732] [INFO] [engine.py:1805:_copy_recovery_script] creating recovery script /fsx/hailey/pythia/ckpts/800M/zero_to_fp32.py
2022-10-14 21:20:30
[2022-10-14 21:20:29,735] [INFO] [engine.py:1818:_save_zero_checkpoint] zero checkpoint saved /fsx/hailey/pythia/ckpts/800M/global_step143000/zero_pp_rank_24_mp_rank_00_optim_states.pt